ABSTRACT

This paper exploits neural networks to provide a fast and automatic way to classify 
lightcurves in massive photometric datasets. As an example, we provide a working neural
network that can distinguish microlensing lightcurves from other forms of variability,
such as eruptive, pulsating, cataclysmic and eclipsing variable stars. The network
has five input neurons, a hidden layer of five neurons and one output neuron. The five 
input variables for the network are extracted by spectral analysis from the lightcurve 
datapoints and are optimised for the identification of a single, symmetric, microlensing 
bump. The output of the network is the posterior probability of microlensing. 

The committee of neural networks successfully passes tests on noisy data taken 
by the MACHO collaboration. When used to process ~ 5000 lightcurves on a typical 
tile towards the bulge, the network cleanly identifies the single microlensing event. 
When fed with a sub-sample of 36 lightcurves identified by the MACHO collaboration 
as microlensing, the network corroborates this verdict in the case of 27 events, but 
classifies the remaining 9 events as other forms of variability. For some of these discrepant
events, it looks as though there are secondary bumps or the bump is noisy or
not properly contained. Neural networks naturally allow for the possibility of novelty 
detection - that is, new or unexpected phenomena which we may want to follow up. 
The advantages of neural networks for microlensing rate calculations, as well as the 
future developments of massive variability surveys, are both briefly discussed. 

Key words: gravitational lensing - variable stars - data processing 


1 INTRODUCTION 

Variability in the sky has been known for thousands of years, 
but our understanding of variable sources remains very incomplete.
Some of the most interesting objects in the sky
are transient. These include supernovae, microlensed stars,
near-Earth or killer asteroids (which are transient because of
their exceptionally large proper motions), optical flashes associated
with gamma-ray bursts and stars undergoing short-lived
but key stages of stellar evolution like the helium core
flash and so on. All these objects are rare. To hunt them
down in a systematic way means that we must record images,
process the data in real-time (or nearly so), recognise
the events from their lightcurves and archive them. 

The earliest examples of massive transient astronomy
searches are the microlensing surveys like MACHO¹,
EROS² and OGLE³. Typically, the surveys monitored


1 http://wwwmacho.anu.edu.au/ 

2 http://eros.in2p3.fr/ 

3 http://sirius.astrouw.edu.pl/~ogle/ 


~ $5 \times 10^6$ stars a few times every night over several years
in the directions of the Galactic Bulge and the Magellanic
Clouds, yielding ~ $10^{10}$ photometric measurements. Out of
the ~ $10^5$ sources which were variable, the surveys tried
to identify ~ $10^2$ true microlensing events. The selection
criteria typically involved the imposition of sets of cuts to
ensure good lightcurve coverage and a steady baseline flux,
to require a single bump and thus eliminate common forms
of stellar variability, to require a good statistical fit
to the achromatic standard microlensing lightcurve and so
on. Many of the cuts were developed through trial and error, and
evolved as the experiments progressed (e.g., Alcock et al.
1997, 2000a). Unambiguous identification of microlensing
events was sometimes not possible, and the collaborations
sometimes reported their results in terms of two sets, one
of high quality events (any lightcurve that was undoubtedly
microlensing) and one of possible events (any lightcurve
with a unique peak and a flat baseline). Sometimes the cuts
even eliminated interesting events. For example, the longest
ever microlensing event OGLE-1999-BUL-32 was originally
missed as its baseline flux was not constant and so failed one
of the imposed cuts (Mao et al. 2002).

Additionally, microlensing alert or early warning systems
(e.g., Udalski et al. 1994) work by reducing the number
of candidates to manageable amounts. Each night’s candidates
are individually examined for the onset of microlensing.
Even for surveys as large as OGLE II, this worked well.
However, still larger surveys are planned for the future and 
therefore it becomes important to automate the procedure 
and issue alerts without human intervention. 

The microlensing experiments are of course not the
only massive photometry searches being conducted by astronomers
at the moment. There are also collaborations primarily
looking for supernovae (e.g., the Supernova Cosmology
Project), optical flashes related to gamma-ray bursts
(ROTSE) and near-Earth asteroids (NEAT and LINEAR).
More generally, as Paczynski (2001, 2002) has emphasised,
the monitoring of the optical sky for variability is likely to
enjoy a huge resurgence over the coming decade given the
low cost of robotic telescopes. The very near future will see
terabyte datasets of lightcurves routinely available to astronomers.
Such datasets will contain complete samples of
variable stars of all types, as well as the very rare objects
or events which primarily motivate the search. It is an urgent
and important problem to automate the classification
of lightcurves in massive variability surveys.

This paper argues that new analysis methods based 
on neural networks will enable us to pinpoint and identify 
scarce transient objects in such huge datasets. Our illustrative
example is the identification of scarce microlensing
events against the background of variable stars. However, 
we envisage that the applicability of the technique is much 
wider. 


2 MICROLENSING LIGHTCURVES 

At any instant, the probability that a source star in the
Galaxy shows the microlensing effect is $< 10^{-6}$. Microlensing
events are hugely outnumbered by stellar variability
which is at least a hundred thousand times more common.
The lightcurve classification problem is to devise algorithms
that diagnose the different kinds of variability. For applications
to microlensing, the algorithm must distinguish microlensing
from other sources of variability (whether intrinsic
or extrinsic).

Let us assume a single, point-like, dark lens. The microlensing
lightcurve has a characteristic form written down
by Paczynski (1986). The lightcurve is symmetric and achromatic.
As the probability of microlensing is so low, the variability
must not repeat. Microlensing is readily distinguished
from some, but unfortunately not all, forms of stellar variability.
A cautionary tale is provided by the fate of the candidate
event EROS-LMC-2. This was one of the microlensing
candidates uncovered by the photographic plate search
of the first phase of the EROS experiment towards the Large
Magellanic Cloud (Ansari et al. 1996). Although the source
star of EROS-LMC-2 was known to be variable at a low
level (Ansari et al. 1995), nonetheless microlensing seemed
favoured by the excellent fit of the lightcurve to the datapoints.
However, there was a substantial second bump in the
lightcurve eight years after the first, and EROS-LMC-2 was
then discarded as a microlensing candidate (Lasserre et al.
2000).

The background in microlensing databases is composed
of periodic variables (e.g., Cepheids, RR Lyrae), eruptive
variables (e.g., dwarf novae, classical novae), semi-regular
variables (e.g., bumpers) and the supernovae occurring in
galaxies behind the source population. Of these, the most
troublesome in microlensing surveys towards the Magellanic
Clouds and Andromeda are the bumpers and the novae-like
objects. Although SNe Ia have reasonably well-understood
lightcurves, the same is not true of other types of supernovae,
which can mimic microlensing rather well (for example,
events 22 and 26 of Alcock et al. 2000a). Long period
bumpers may be present as single bumps even in 5 seasons'
worth of data and they can be well-fit by the standard microlensing
lightcurve.

Let us stress that the identification of microlensing
events remains an awkward - and not fully solved - problem.
For example, it probably lies at the heart of the seeming
discord between the results of the MACHO and EROS experiments
towards the Large Magellanic Cloud (LMC). The
MACHO group identified between 13 and 17 events towards
the LMC, whereas the competing EROS group found only
3 (Alcock et al. 2000a, Lasserre et al. 2000). Although the
exposure times and field locations between the two experiments
do vary, nonetheless the rate found by MACHO is
at least twice that found by EROS. This same disparity
is also seen in the experiments towards the Galactic
Bulge, as MACHO find an optical depth to microlensing of
~ $3.23 \times 10^{-6}$ (Alcock et al. 2000b), whereas the EROS value
is about half of this (Afonso et al. 2003). Possible explanations
are that the MACHO selection algorithm may be too
loose (causing contamination with other variable sources), or
that the EROS selection algorithm may be too harsh (causing
genuine events to be discarded). It is here that neural
networks may be able to make a decisive contribution.


3 AN INFORMAL INTRODUCTION TO 
NEURAL NETWORKS 

Neural networks have been used before for pattern recognition
tasks in physics (e.g., Bishop 1995). In particular, they
are often used in high energy physics experiments as triggers
to select interesting events from large datasets (Muller,
Reinhardt & Strickland 1995, chapter 8). Recent astronomical
applications include classification of optical stellar spectra
(Bailer-Jones et al. 1997) and galaxy type (Lahav et al.
1995), object detection in wide field imaging (Andreon et 
al. 2000) and predictions of astronomical time series (e.g., 
Conway 1998, Perdang & Serre 1998). There has also been a 
recent report of preliminary results on automatic lightcurve 
classification by the ROTSE collaboration (Wozniak et al. 
2001). An interesting review of a number of astronomical 
applications is in Storrie-Lombardi & Lahav (1994). 

In a neural network, the neurons are arranged in layers.
The input data is fed to the bottommost layer. The output
value emerges from the topmost layer; the intervening layers
are hidden. The values $a_j$ of the neurons in any layer are
calculated via

$$ a_j = \sum_i w_{ji} z_i . \qquad (1) $$
Figure 1. This shows sample lightcurves of different types of variability included in the training and validation sets.


Variable      Reference

Eruptive      van Genderen (1995), AAVSO
Pulsating     Antonello & Morelli (1996), AAVSO
Cataclysmic   Hamuy et al. (1996), AAVSO
Eclipsing     Brancewicz & Dworak (1980)


Table 1. Sources of lightcurves of variable stars. AAVSO is the American Association of Variable Star Observers. 


Here, $w_{ji}$ are the synaptic weights of the $j$th neuron with respect
to the $i$th neuron and $z_i$ are the activation values. The
activation value is computed from the value on the neuron
via an activation function $g$

$$ z_i = g(a_i). \qquad (2) $$

As an activation function, we use the logistic function

$$ g(a) = \frac{1}{1 + \exp(-a)}, \qquad (3) $$

which allows us to interpret the outputs of the network as a
posteriori probabilities (Bishop 1995, chapters 3,6).
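
Equations (1)-(3) can be illustrated with a minimal sketch of a forward pass through a network with five inputs, five hidden neurons and one output, as described in the abstract. The constant bias input, the random weights and the function names are assumptions made for illustration; they are not taken from the SNNS implementation.

```python
import math
import random

def logistic(a):
    """Logistic activation g(a) = 1 / (1 + exp(-a)), equation (3)."""
    return 1.0 / (1.0 + math.exp(-a))

def layer(weights, z):
    """Equations (1)-(2): a_j = sum_i w_ji z_i, followed by z_j = g(a_j)."""
    return [logistic(sum(w * zi for w, zi in zip(row, z))) for row in weights]

def forward(w_hidden, w_out, x):
    """Propagate five inputs through a 5-5-1 network and return the output y."""
    z_in = list(x) + [1.0]                  # extra constant input acts as a bias (assumption)
    z_hid = layer(w_hidden, z_in) + [1.0]   # bias for the output neuron (assumption)
    return layer(w_out, z_hid)[0]

rng = random.Random(0)
# Hypothetical random weights: 5 hidden neurons x (5 inputs + bias), 1 output x (5 hidden + bias).
w_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(5)]
w_out = [[rng.uniform(-1.0, 1.0) for _ in range(6)]]
y = forward(w_hidden, w_out, [0.1, -0.3, 0.5, 0.0, 1.2])
print(y)
```

Because the output neuron also uses the logistic function, the returned value always lies strictly between 0 and 1, which is what permits its interpretation as a probability.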

We start with a sequence of input units (the “patterns”) 
for which the desired values of the output (the “targets”) are 
known. This is called the training set. Given the patterns 
and a set of weights, we can construct an error function E 
which quantifies the performance of the network. We want 
to obtain the weights Wji that minimise the error function 
over the training set using a steepest descent scheme. 

We begin with random values for the weights and perform
a sequence of iterative updates using a variant of back-propagation
as the learning algorithm. The error derivatives
with respect to the weights are

$$ \frac{\partial E^n}{\partial w_{ji}} = \delta_j^n z_i^n, \qquad \delta_j^n \equiv \frac{\partial E^n}{\partial a_j}, \qquad (4) $$

where $n$ labels the pattern. Using the chain rule, we obtain
the back-propagation formula

$$ \delta_j = g'(a_j) \sum_k w_{kj} \delta_k, \qquad (5) $$

which shows how the values of $\delta$ propagate through the network,
given the target value. In each iteration, the weights
are updated according to the following rule

$$ \Delta w_{ji} = -\eta \sum_n \delta_j^n z_i^n, \qquad (6) $$

where $\eta$ is the constant learning rate. The sum is performed
over all the patterns. This is equivalent to the steepest
descent method of minimizing the error. In practice, we
use a refinement of this algorithm, called resilient back-propagation,
which helps to prevent entrapment in local
minima (see e.g., Bishop 1995, section 7.5.3).
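
As a hedged illustration of equations (4)-(6), the sketch below performs plain batch steepest descent on a tiny network with one hidden layer. It is not resilient back-propagation and not the SNNS code. The toy data, the absence of bias terms, the learning rate and the use of a cross-entropy error with a logistic output (for which the output delta is $y - t$) are all assumptions chosen to keep the example short.

```python
import math
import random

def g(a):
    """Logistic activation, equation (3)."""
    return 1.0 / (1.0 + math.exp(-a))

def gradients(w_hid, w_out, patterns, targets):
    """Return (dE/dw_hid, dE/dw_out, E), each summed over all patterns."""
    grad_hid = [[0.0] * len(row) for row in w_hid]
    grad_out = [0.0] * len(w_out)
    error = 0.0
    for x, t in zip(patterns, targets):
        z = [g(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
        y = g(sum(w * zj for w, zj in zip(w_out, z)))
        error -= t * math.log(y) + (1.0 - t) * math.log(1.0 - y)
        delta_out = y - t  # assumed: logistic output with cross-entropy error
        for j, zj in enumerate(z):
            grad_out[j] += delta_out * zj                     # equation (4)
            delta_j = zj * (1.0 - zj) * w_out[j] * delta_out  # equation (5), g' = z (1 - z)
            for i, xi in enumerate(x):
                grad_hid[j][i] += delta_j * xi                # equation (4)
    return grad_hid, grad_out, error

def descent_step(w_hid, w_out, patterns, targets, eta):
    """Equation (6): shift every weight by -eta times its summed gradient."""
    grad_hid, grad_out, error = gradients(w_hid, w_out, patterns, targets)
    w_hid = [[w - eta * gw for w, gw in zip(row, grow)]
             for row, grow in zip(w_hid, grad_hid)]
    w_out = [w - eta * gw for w, gw in zip(w_out, grad_out)]
    return w_hid, w_out, error

# Hypothetical toy data: two inputs, target 1 when the first input is positive.
patterns = [(1.0, 0.2), (0.8, -0.5), (-1.0, 0.3), (-0.7, -0.4)]
targets = [1.0, 1.0, 0.0, 0.0]
rng = random.Random(1)
w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(3)]
errors = []
for _ in range(200):
    w_hid, w_out, err = descent_step(w_hid, w_out, patterns, targets, eta=0.3)
    errors.append(err)
print(errors[0], errors[-1])
```

The analytic gradients can be checked against finite differences of the error, which is a useful safeguard whenever the back-propagation formula is coded by hand.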

As the network is converging to a minimum, it is important
to prevent overtraining. This is done by feeding a different
set of patterns (the “validation set”) to the network.
The errors over the patterns in the training and the validation
sets are separately computed. The training process is
stopped just before the error in the validation set begins to
rise. Finally, the performance of the fully trained network
can be assessed with a third set of patterns (the “test set”).
It is important to ensure that the training, validation and
test sets do not contain any identical patterns.
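
The stopping rule can be sketched as follows. This is one simple way of deciding where the validation error "begins to rise"; the `patience` parameter and the example error values are hypothetical and are not taken from the paper.

```python
def early_stopping_epoch(validation_errors, patience=5):
    """Return the epoch with the lowest validation error seen before the
    error has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            break  # validation error has been rising or flat for too long
    return best_epoch

# Hypothetical validation errors recorded once per training epoch.
val = [0.90, 0.60, 0.40, 0.35, 0.33, 0.34, 0.36, 0.40, 0.45, 0.50, 0.30]
print(early_stopping_epoch(val))  # prints 4
```

The weights saved at the returned epoch would then be the ones used on the test set. Note that the late dip to 0.30 is never reached, because training has already been stopped.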


4 IMPLEMENTATION 

The experiments described below use
the Stuttgart Neural Network Simulator
(http://www-ra.informatik.uni-tuebingen.de/SNNS).

Our network is composed of one input layer, one hidden 
layer and one output layer. The hidden layer is fully 
connected to the input and output layers. There are 5 
neurons in the input layer, 5 neurons in the hidden layer 
and one neuron in the output layer. The value of the output 
neuron gives the probability that the event is microlensing. 
The reason for the choice of 5 input neurons will become 
obvious shortly. 

4.1 The Training and the Validation Sets 

There are three types of lightcurves in the training set - 
simulated microlensing events, variable star lightcurves from 
archival sources, and sample lightcurves from a microlensing 
experiment (in this case, the MACHO experiment). 

Simulated microlensing events are generated by randomly
choosing an impact parameter, an Einstein crossing
time between 7 days and 365 days and a time when the
event reaches maximum. Random Gaussian noise is added
with a dispersion in the range from 0.1 to 20 % of the maximum
flux. The lightcurves are sparsely sampled using the
MACHO sampling.
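
A minimal sketch of such a simulation is given below, using the standard point-source, point-lens magnification of Paczynski (1986). Several choices are assumptions, since the text does not specify them: the uniform distributions, the range of impact parameters, the treatment of the crossing time as the Einstein radius crossing time, and the random sampling times that stand in for the real MACHO sampling.

```python
import math
import random

def magnification(t, t0, t_e, u0):
    """Point-lens magnification A(u) = (u^2 + 2) / (u sqrt(u^2 + 4))."""
    u = math.sqrt(u0 ** 2 + ((t - t0) / t_e) ** 2)
    return (u ** 2 + 2.0) / (u * math.sqrt(u ** 2 + 4.0))

def simulate_event(times, rng, baseline=1.0):
    """Return a noisy simulated lightcurve and the parameters used."""
    u0 = rng.uniform(0.05, 1.0)            # impact parameter range (assumption)
    t_e = rng.uniform(7.0, 365.0)          # crossing time in days (range from the text)
    t0 = rng.uniform(times[0], times[-1])  # time of maximum
    clean = [baseline * magnification(t, t0, t_e, u0) for t in times]
    # Noise dispersion between 0.1 and 20 per cent of the maximum flux.
    sigma = rng.uniform(0.001, 0.20) * max(clean)
    noisy = [f + rng.gauss(0.0, sigma) for f in clean]
    return noisy, {"u0": u0, "t_e": t_e, "t0": t0, "sigma": sigma}

rng = random.Random(42)
# Hypothetical irregular sampling times in days, standing in for the MACHO sampling.
times = sorted(rng.uniform(0.0, 1000.0) for _ in range(300))
flux, params = simulate_event(times, rng)
print(len(flux), params)
```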

Variable stars may be divided into periodic variables
and eruptive/cataclysmic variables. The former are usually
easier to distinguish from microlensing than the latter,
always provided more than one period can be detected in the
sampled datastream. Examples of typical lightcurves for different
types of variability are shown in Figure 1. The periodic
variables include pulsating stars (such as Cepheids
and Miras) and eclipsing stars. Eruptive variables include T
Tauri, S Doradus and pre-main sequence stars. Cataclysmic
variables include novae, supernovae and symbiotic variables.
The relative frequencies with which these stars occur are
not important in our analysis. All that matters is that the
gamut of shapes is well-represented in the training set. We
are therefore interested as much in regular representatives as
in extreme examples of the lightcurves. Lightcurves for the
variable stars are selected from the sources listed in Table 1.
For long data sequences, the experimental window is placed
randomly on the lightcurve. In this way, we ensure that the
bumps in the lightcurves do not occur in a privileged place.

Finally, there are lightcurves randomly chosen from the
MACHO database (specifically, from field 113 towards the
Bulge). The rationale for this is that instrumental artefacts
are certainly present in the MACHO lightcurves and it is 
important for the neural network to be able to recognise 
these. 

The training set contains 400 microlensing lightcurves,
150 stellar variable lightcurves and 200 MACHO lightcurves.
The validation set contains the same number of lightcurves,
although the individual representatives are obviously different.
The test set comprises the ~ 5000 lightcurves from MACHO
tile 113.18292, which is part of field 113 towards the Galactic
bulge. Let us note that - compared with real data from a
variability survey - microlensing events are over-represented
in our training and validation sets. The consequence of this
is that the network will provide more false positives (as the
prior probability of microlensing is too high). This is highly
desirable, as the best approach to detecting such an intrinsically
rare phenomenon as microlensing is to force fewer false
negatives at the expense of more false positives.

4.2 Pre-Processing 

In many applications, it is both customary and advantageous
to pre-process data for feeding to the neural network.
The main problem with using raw photometry data is the
curse of dimensionality (see Bishop 1995, chapter 8). The
simplest way of overcoming this is to extract features of the
lightcurve and use these as inputs to the network. Properly implemented,
this can lead to a very efficient network, as prior
knowledge can be incorporated and redundant variables can
be discarded in the pre-processing. However, there are dangers
as well, as important features in the lightcurves can be
erased.

The aim of a neural network is not to model the patterns
but to model the decision boundary between the patterns.
In microlensing surveys, event identification normally
proceeds by making sequences of cuts, in which case the 
decision boundary is formed by a set of hyperplanes. The 
advantage of a neural network over conventional sequences 
of straight line cuts is that the former offers a better chance 
of describing a complicated decision boundary accurately. 

Microlensing events are characterised by the presence 
of a (iv) single, (iii) symmetric, (ii) positive (i) excursion 
from the baseline. The event itself is characterised by (v) a 
timescale. Motivated by these five features, we extract from
the lightcurves the following five parameters, which are inputs
to the neural networks.

The first, $x_1$, is the maximum value of the autocorrelation
function. This helps to discriminate against noise
and identify the presence of any signal. The second, $x_2$, is
calculated as follows. First, we compute the median of the
flux measurements, which gives a good approximation to the
baseline. We then compute the mean of the datapoints lying
above and below the median and finally take their ratio. This
is then mapped on the interval [0.5,1] with the logistic function.
The input $x_2$ tests for the positiveness of the excursion.
The third, $x_3$, is the maximum value of the cross-correlation
function of the lightcurve with the time-reversed lightcurve.
This provides a test for symmetric events. The fourth, $x_4$, is
the mean frequency $\langle\nu\rangle$ calculated with the power spectrum
$P(\nu)$ as a weighting function. For a periodic variable, we
expect a shift in the weighted mean frequency from zero.
We compress $x_4$ with the logistic function to lie in the range
[0.5,1.0]. Finally, the fifth, $x_5$, is the width of the autocorrelation
function, as judged by its standard deviation. If the
event is microlensing, then the width is a rough indication
of the timescale.
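
Two of these inputs, the positivity measure and the symmetry measure, can be sketched as below for a uniformly sampled lightcurve. The details are assumptions where the text leaves them open: excursions are measured as mean distances from the median, the cross-correlation is mean-subtracted and normalised so that its maximum is at most 1, and the function names are hypothetical.

```python
import math
import statistics

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def positivity(flux):
    """Sketch of x_2: ratio of the mean excursion above the median to the
    mean excursion below it, squashed by the logistic function."""
    med = statistics.median(flux)
    above = [f - med for f in flux if f > med]
    below = [med - f for f in flux if f < med]
    if not above:
        return 0.5   # no positive excursion at all
    if not below:
        return 1.0   # purely positive excursion
    return logistic(statistics.fmean(above) / statistics.fmean(below))

def max_cross_correlation(a, b):
    """Maximum over all lags of the normalised cross-correlation."""
    n = len(a)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    norm = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if norm == 0.0:
        return 0.0   # constant lightcurve carries no shape information
    best = -1.0
    for lag in range(-(n - 1), n):
        overlap = range(max(0, lag), min(n, n + lag))
        best = max(best, sum(da[i] * db[i - lag] for i in overlap) / norm)
    return best

def symmetry(flux):
    """Sketch of x_3: cross-correlate the lightcurve with its time reverse."""
    return max_cross_correlation(flux, flux[::-1])

bump = [1.0, 0.9, 1.1, 2.0, 5.0, 2.0, 1.1, 0.9, 1.0]   # symmetric bump
skewed = [1.0, 1.0, 1.0, 5.0, 3.0, 2.0, 1.5, 1.0, 1.0]  # fast rise, slow decline
print(positivity(bump), symmetry(bump), symmetry(skewed))
```

With these definitions, a ratio of positive to negative excursions is always positive, so the logistic function maps it into the interval between 0.5 and 1, consistent with the range quoted in the text.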

To motivate this choice of inputs, Figure 2 shows the locations
of all the patterns in the validation and training sets.
The desideratum is that the choice of inputs offers a clear
separation between microlensing events and other patterns
in the five dimensional space $(x_1, \ldots, x_5)$. The projections of
this space onto the principal planes offer grounds for believing
this, as there is already good partial separation in some
of the plots (e.g., $x_1$ versus $x_3$) and good evidence for regularities
in others (e.g., $x_1$ versus $x_2$). The final proof that
the choice of inputs is good can, however, only be provided
by the performance of the network on the test set.

Note that Figure 2 plots unnormalized input variables;
however, the neural network uses normalized inputs. Scaling
of the inputs to numbers of the order of unity is often useful,
as this means that the network weights also typically take
values of the same order (Bishop 1995, chapter 8). Pictorially,
this can be thought of as requiring the hyperplanes
associated with each hidden unit to intersect close to the 
origin and near the center of the datacloud. For each input 
variable, this scaling is done by subtracting the mean and 
dividing by the standard deviation to give the normalized 
inputs. 
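
The scaling step can be sketched as follows. The use of the population standard deviation and the function name are assumptions; a column with zero variance would need special handling that is omitted here.

```python
import statistics

def normalise_inputs(patterns):
    """Scale each input variable to zero mean and unit standard deviation.

    `patterns` is a list of rows, one row of input values per lightcurve.
    Returns the scaled rows together with the means and standard deviations.
    """
    columns = list(zip(*patterns))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # population form (assumption)
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)]
              for row in patterns]
    return scaled, means, stds

# Hypothetical raw inputs for four lightcurves with two input variables each.
raw = [[0.2, 10.0], [0.4, 30.0], [0.6, 20.0], [0.8, 40.0]]
scaled, means, stds = normalise_inputs(raw)
print(scaled)
```

Returning the means and standard deviations allows the same transformation to be applied later to lightcurves that were not part of the set used to compute them.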

So far, we have skirted round the problem of missing 
datapoints. For MACHO data, ~ 10% of the lightcurves 
have gaps of the order of a few days (aside from the 5 month 
gaps when the Galactic bulge is not visible from Australia). 
To compute the correlation functions, the data is treated as 
if it were uniformly sampled. This gives rise to some errors. If 
the typical gap size is much smaller than the event timescale, 
then any errors we have introduced by this procedure will 
be small. If the gap size relative to the timescale is very 
large, then no classification can be plausibly extracted. If 
the gap size is of the same order as the timescale, then the 
experiment needs re-designing. The input most sensitive to
missing data is $x_4$ because this requires computation of the
power spectrum. There are, however, existing algorithms to
do this for unevenly sampled data (e.g., Lomb’s periodogram
as implemented by Press & Rybicki (1989) and Press et al.
(1992)), which we employ.
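
For illustration, the normalised Lomb periodogram can be written directly from its definition, and a power-weighted mean frequency computed from it. This is the slow direct form, not the fast algorithm of Press & Rybicki (1989), and the frequency grid, test signal and function names are assumptions.

```python
import math
import random

def lomb_power(times, values, omega):
    """Normalised Lomb periodogram at angular frequency omega."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Time offset tau makes the periodogram independent of the time origin.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum((v - mean) * c for v, c in zip(values, cos_terms))
    ys = sum((v - mean) * s for v, s in zip(values, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    power = 0.0
    if cc > 0.0:
        power += yc * yc / cc
    if ss > 0.0:
        power += ys * ys / ss
    return power / (2.0 * var)

def weighted_mean_frequency(times, values, freqs):
    """Mean frequency with the periodogram power as the weighting function."""
    powers = [lomb_power(times, values, 2.0 * math.pi * f) for f in freqs]
    return sum(f * p for f, p in zip(freqs, powers)) / sum(powers)

rng = random.Random(3)
# Hypothetical unevenly sampled sinusoid with a frequency of 0.1 cycles per day.
times = sorted(rng.uniform(0.0, 100.0) for _ in range(80))
values = [math.sin(2.0 * math.pi * 0.1 * t) for t in times]
freqs = [k / 100.0 for k in range(1, 51)]
powers = [lomb_power(times, values, 2.0 * math.pi * f) for f in freqs]
peak = freqs[powers.index(max(powers))]
print(peak, weighted_mean_frequency(times, values, freqs))
```

For a periodic signal such as this one, the power is concentrated near the true frequency, so the weighted mean frequency is pulled away from zero, as the text anticipates for periodic variables.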

Note that pre-processing gives rise to fast and powerful
neural networks, but it can also cause loss of potentially
important information in the data. To check this, we can
allow a neural network itself to perform the projection. This
leads to much bigger neural networks which consequently
take longer to converge. However, it does have the advantage
that no assumptions are built in from the beginning. In
this spirit, we experimented with a big neural network which
takes as the two input layers the unadulterated flux measurements
and errors at the sampling times and has ~ 200
hidden neurons. Once converged, the performance of this big 
network is similar to the performance of smaller networks on 
pre-processed data. From this, we draw the conclusion that 
our pre-processing has not caused any serious degradation 
of information in the data. 


4.3 Training 

In training, the weights are initialised to random values. We
then perform iterations to reduce the error function

$$ E = -\sum_n \left[ t^n \log y^n + (1 - t^n) \log(1 - y^n) \right], \qquad (7) $$

where $t^n$ and $y^n$ are the target and the response of the output
neuron for the $n$th pattern. We have chosen this form
of the error function (the so-called cross-entropy error function)
as appropriate for two class problems (see e.g., Bishop
1995, section 6.7). Given our choice of activation (3) and error
(7) functions, the output $y^n$ approximates the posterior
probability $P(\mathrm{microlensing} \,|\, \mathrm{inputs})$.
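
A short worked derivation, under the assumption that the logarithm in the error function is the natural logarithm, shows why the logistic activation pairs conveniently with the cross-entropy error: the derivative of the per-pattern error with respect to the value $a^n$ on the output neuron reduces to the difference between response and target.

```latex
\begin{align}
E^n &= -\left[ t^n \ln y^n + (1 - t^n) \ln (1 - y^n) \right],
      \qquad y^n = g(a^n), \\
g'(a) &= g(a)\,\bigl(1 - g(a)\bigr), \\
\frac{\partial E^n}{\partial a^n}
  &= -\left[ \frac{t^n}{y^n} - \frac{1 - t^n}{1 - y^n} \right] y^n (1 - y^n)
   = y^n - t^n .
\end{align}
```

This quantity is the output-layer value of $\delta$ from which the back-propagation formula starts.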

The neural network must be able to generalise from the 
patterns in the training set, and not merely reproduce them. 
A worry is that the network will be over-trained and will reproduce
structures of the decision boundary in unnecessary
detail. To guard against this, we use early stopping as illustrated
in Figure 3. The performance of the network on the
validation set is compared to that on the training set and 
the training stopped just before the error in the validation 
set rises. Another safeguard is provided by the introduction 
of a small amount of noise to the weights on each iteration, 
which guards against entrapment in a local minimum. 

If training is started from different initial weights, we 
converge to slightly different final weights. This makes it advantageous
to use a committee of 10 neural networks (Bishop
1995, section 9.6). There are a total of 1500 lightcurves available.
For each member of the committee, the 1500 patterns
are split in half randomly to give validation and training sets 
with 750 members. For each pattern, the final output is the 
average of the output of all 10 neural networks. 
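
The committee procedure can be sketched as below. The member networks are represented by stand-in functions returning fixed values, purely for illustration; in practice each member would be a network trained on its own random split. The function names are hypothetical.

```python
import random

def split_in_half(patterns, rng):
    """Randomly split the patterns into training and validation halves."""
    shuffled = list(patterns)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def committee_output(members, inputs):
    """Average the outputs of all committee members for one pattern."""
    outputs = [member(inputs) for member in members]
    return sum(outputs) / len(outputs)

rng = random.Random(0)
training, validation = split_in_half(range(1500), rng)
print(len(training), len(validation))  # prints 750 750

# Stand-in members (assumption): each "network" returns a fixed probability.
members = [lambda x: 0.75, lambda x: 0.80, lambda x: 0.85]
print(committee_output(members, [0.0] * 5))
```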

The histogram of the output values for the combined
validation and training set is shown in Figure 4. There is a
very clean separation of microlensing events and other forms
of variability. The non-microlensing events are strongly
peaked at a probability y = 0, but there are a few events
(~ 10) that extend up to y = 0.2. The microlensing events
are strongly peaked at y = 1, although again there are a few
(~ 10) that extend down to y = 0.7. The probability y = 0.5
corresponds to the formal decision boundary (Bishop 1995,
section 10.3). In fact, in the range 0.2 < y < 0.7, there are almost
no events in the histogram. If, when presented with a
lightcurve, the neural network does give an output in this
range, then the classification is in reality uncertain. This is
because any error in the output can cause it to straddle the
formal decision boundary. This range of outputs really corresponds
to patterns that are not present in the training and
validation sets. This is valuable as it offers the possibility of
the detection of unexpected and novel events in variability
surveys.
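
The interpretation of the committee output described above can be expressed as a simple three-way rule. The thresholds 0.2 and 0.7 are the ones quoted in the text; the function name and the labels are hypothetical.

```python
def interpret_output(y, low=0.2, high=0.7):
    """Map the committee output y to a coarse verdict.

    Outputs strictly between `low` and `high` fall in the sparsely populated
    range, where the classification is uncertain and the lightcurve may be
    a candidate for novelty follow-up.
    """
    if y >= high:
        return "microlensing"
    if y <= low:
        return "other variability"
    return "uncertain"

for y in (0.02, 0.45, 0.6, 0.93):
    print(y, interpret_output(y))
```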

There are just 3 microlensing events out of 800 that
are misclassified (i.e., have y < 0.5). These are scarcely visible
on the histogram. It is interesting to locate these events
in our input space (see Figure 2). These events have input
coordinates (-1.44, -1.07, -1.20, -1.17, -1.00), (-1.40, -1.16,
-1.16, -1.14, -0.8) and (-1.32, -1.11, -1.0, -1.11, -0.8). They
are small amplitude or short duration events dominated by
noise, as indicated by the value of the $x_1$ input which measures
the presence of the signal. There is 1 false positive (i.e.,
a non-microlensing lightcurve with y > 0.5), which has coordinates
(-1.5, -0.7, -1.23, -1.2, -1.1). This is a lightcurve
from the MACHO tile which is probably neither a microlensing
event nor a variable star, but just noise.

5 TESTS TOWARDS THE BULGE FIELDS 
5.1 Normal Events 

All MACHO lightcurves extracted with conventional PSF
photometry (such as SoDoPhot) are now publicly available
(Allsman & Axelrod 2001). As a first test, we use lightcurves
Figure 2. This shows projections onto the principal planes of the five-dimensional space of inputs. Bold circles show the microlensing
events and grey crosses the variable stars and noise in the training and validation sets. 





Figure 3. This shows the value of the cross-entropy error function versus the epochs of training (number of iterations) for the patterns
in the training and validation sets. The long-dashed line shows the point at which the training is stopped. The sum of errors is ~ 15
out of the 700 patterns in the set. 
Figure 4. This shows the histogram of output values for the 1500 patterns in the validation and training sets. Note the clean separation
between microlensing and other types of variability. 


ID  MACHO ID     R     B      ID  MACHO ID     R     B

1   97-BLG-24    0.93  1.00   2   95-BLG-5     0.92  0.96
3   97-BLG-42    0.00  0.00   4   97-BLG-s4    0.72  0.00
5   95-BLG-15    0.00  0.00   6   95-BLG-s8    1.00  1.00
7   97-BLG-18    1.00  1.00   8   96-BLG-26    1.00  1.00
9   97-BLG-38    0.64  0.47   10  97-BLG-58    1.00  1.00
11  96-BLG-1     1.00  1.00   12  97-BLG-2     1.00  1.00
13  96-BLG-14    0.00  0.00   14  95-BLG-s9    0.99  0.91
15  96-BLG-21    0.68  0.00   16  95-BLG-1     1.00  1.00
17  96-BLG-s10   -     0.16   18  96-BLG-20    0.99  1.00
19  96-BLG-10    0.99  0.90   20  95-BLG-4     0.81  0.02
21  95-BLG-23    0.00  0.00   22  95-BLG-s13   0.64  0.05
23  95-BLG-10    1.00  1.00   24  97-BLG-4     0.00  0.00
25  97-BLG-16    1.00  0.23   26  96-BLG-8     0.96  0.99
27  95-OGLE-16   0.99  -      28  95-BLG-39    0.16  1.00
29  95-BLG-3     0.39  0.00   30  97-BLG-37    1.00  1.00
31  97-BLG-14    0.21  0.00   32  95-BLG-11    0.01  0.00
33  96-BLG-31    0.81  1.00   34  96-BLG-s16   1.00  0.83
35  97-BLG-s14   0.80  0.90   36  95-BLG-22    0.30  0.79


Table 2. This shows the output of the committee of neural networks on the subset of candidates towards the bulge in Alcock et al. 
(2000b) which are selected on the basis of the conventional PSF photometry package (SoDoPhot). The results of the analysis of the red 
and blue lightcurves are shown separately. The output is the probability that the event is microlensing. (Note that the red data for event 
17 and the blue data for event 27 are unavailable). 


from tile 18292 of field number 113, which lies towards
the Galactic bulge. This tile contains ~ 5000 lightcurves
of which one was identified by MACHO as a microlensing
event. The MACHO data are taken at a site with moderate
seeing. According to Alcock et al. (2000b), the median
seeing is ~ 2.1 arcsec. This means that the quality of the
data is sometimes quite poor. To allow for this, we clean
the lightcurves by removing all isolated points with more
than $3\sigma$ deviation from the immediately preceding and succeeding
datapoints. In general, this makes good sense as it
removes outliers, but it can sometimes remove meaningful
datapoints for very rapid brightness variations.
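
One possible reading of this cleaning step is sketched below. It is assumed here that $\sigma$ is the quoted photometric error of the point being tested, and that an "isolated" point is one that deviates from both neighbours in the same direction; neither detail is stated in the text, and the function name is hypothetical.

```python
def clean_lightcurve(times, flux, errors, n_sigma=3.0):
    """Drop isolated points lying more than n_sigma errors away from both
    the preceding and the succeeding datapoint, on the same side."""
    keep = [True] * len(flux)
    for i in range(1, len(flux) - 1):
        d_prev = flux[i] - flux[i - 1]
        d_next = flux[i] - flux[i + 1]
        same_side = d_prev * d_next > 0.0   # spike, not part of a steady rise or fall
        threshold = n_sigma * errors[i]
        if same_side and abs(d_prev) > threshold and abs(d_next) > threshold:
            keep[i] = False
    return [(t, f, e) for t, f, e, k in zip(times, flux, errors, keep) if k]

# Hypothetical lightcurve with one spurious spike at the third epoch.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
flux = [1.0, 1.1, 10.0, 0.9, 1.0]
errors = [0.1] * 5
cleaned = clean_lightcurve(times, flux, errors)
print(cleaned)
```

The same-side condition keeps points on a steep monotonic rise or decline, although a genuine single-epoch flare would still be removed, which is the risk the text acknowledges for very rapid variations.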

Each cleaned lightcurve is shown to the committee of 
neural networks. The red and blue passband data are analysed
separately. In principle, it would be advantageous to
analyze the red and blue data together because most variable 


Figure 5. This shows the output of the committee of neural networks for all lightcurves in tile 113.18292, which is publicly available
from the MACHO project website (see Allsman & Axelrod 2001). Shown on the vertical and horizontal axes are the probabilities that 
the blue and red lightcurves are microlensing. There are ~ 5000 lightcurves on the tile, including one event BLG-95-1 identified by the 
MACHO collaboration as microlensing. This is shown as the black spot. 


stars show chromaticity differences. However, this option is
not open to us at the moment because the publicly available
colour information on variable stars is still quite limited.
Figure 5 shows the results of the deliberations of the committee.
The probability of microlensing given the blue data
is shown against the probability given the red data. There is
only one pattern with a high probability in both passbands,
namely the event identified by MACHO as BLG-95-1. It is
clearly and cleanly separated
from the rest of the patterns in the figure as a black
circle in the topmost right corner. There is an additional pattern
that has output values $y \approx 0.6$ for both the red and blue
data. This falls within the regime of novelty detection. Its
input coordinates are (-0.6, -1.5, -0.8, -0.66, 1.26). It is a
very long event since $x_5 = 1.26$ is higher than typical values
for microlensing events. It falls into a poorly sampled region in
Figure 2, which suggests why this low signal-to-noise lightcurve
was dragged into the microlensing range. Its lightcurve is
shown in the upper panel of Figure 6. It is most probably
a form of stellar variability that does not lie in the training
and validation sets. It is interesting to note that there
are a number of lightcurves with output greater than 0.9 in
one band, but not in the other. Shown in the lower panel
of Figure 6 is a typical example, in this case securely identified
in blue (y > 0.95) but not in red (y < 0.05). The
blue lightcurve does indeed look like a microlensing event,
but the better sampling in the red passband shows a highly
active many-humped lightcurve which is most probably an
eruptive variable.

As a second test, we analyse the lightcurves for all 36
events in Alcock et al. (2000b) that were identified on the
basis of conventional PSF photometry. Table 2 shows the
results of the poll of the committee. In each case, the output
of the neural network on the red and the blue data is
given. Of course, it is important to bear in mind that the
MACHO group’s classification algorithm is itself probably
not 100 per cent efficient. There are reasons to believe -
both from the very high rate towards the Galactic Center
that is incompatible with theoretical models of the Galaxy
and from the differences between the MACHO and EROS
results - that the subsample of candidates found by MACHO
may have some contamination. There are a total of 19
events identified with a probability > 0.5 as microlensing in
both the red and blue filters. In fact, these events are all
beyond reproach as microlensing candidates, as their
probabilities exceed 0.9. Events 28 and 36 are securely identified
in the blue data, but the red data are corrupted. Events 4, 9, 15,
Figure 6. The upper panels show the blue and red lightcurves for the event identified by the grey spot in Figure 5. The lower panels
show the lightcurves for the event securely identified in blue (left panel), but not in red (right panel). In all cases, the horizontal axis is
time in days, the vertical axis is flux in ADU/s.


20, 22 and 25 are identified in the red data, but not in the
blue. Lastly, there are 9 events for which no microlensing
signal whatsoever is detected (event numbers 3, 5, 13, 17,
21, 24, 29, 31, 32). We shall examine the lightcurves of some
of these events shortly, but for the moment let us emphasise
that there is no guarantee that the original identification by
the MACHO collaboration was correct.
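
The tallies quoted above can be cross-checked with a short,
purely illustrative Python sketch. The event numbers are those
given in the text. The set of events identified in both bands is
not listed explicitly, so it is obtained here by subtraction,
which is an assumption of the sketch rather than something read
from the table.

```python
# Event designations 1-36 refer to the MACHO bulge candidates discussed in the text.
all_events = set(range(1, 37))

# Lists quoted in the text.
blue_only = {28, 36}                           # identified in blue; red data corrupted
red_only = {4, 9, 15, 20, 22, 25}              # identified in red but not in blue
neither = {3, 5, 13, 17, 21, 24, 29, 31, 32}   # no microlensing signal in either band

# Assumption of this sketch: every remaining event is identified in both bands.
both = all_events - blue_only - red_only - neither

# Events corroborated by the network in at least one passband.
corroborated = both | blue_only | red_only

print(len(both), len(corroborated), len(neither))  # prints: 19 27 9
```

The arithmetic reproduces the 19 events identified in both
bands, the 27 events corroborated in at least one band and the
9 events for which no microlensing signal is found.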

Fig. 7 shows the contours of probability for the training
and validation sets in the input space. Light grey means that
the probability is greater than 0.5 and corresponds to the
formal decision boundary (see Bishop 1995, section 10.3).
Dark grey means that the probability is greater than 0.9 and
corresponds to almost certain microlensing. The irregularity
of the contours is due to the fact that some regions are poorly
sampled in the training and validation sets. The contours
have been drawn with a view to guiding the eye. Superposed
on the contours in Fig. 7 are the events. The nine unfilled
circles are those identified by the network as variable stars
but by MACHO as microlensing events. The black circles
are those which both MACHO and the network identify as
microlensing.

There are a number of things to notice in the diagram.
First, it is evident that the network has the ability to extrapolate
from the validation and training sets and assign relative
importance to the combinations of features extracted by
the input variables. This is clear because there are events
securely identified although they lie outside the contours (for
example, event 18 is unambiguously identified despite lying
outside the probability contours in the two top panels).
Second, the x± input is the only one for which explicit
allowance has been made for noise and sampling. The network
seems to assign greater importance to this input, as almost
all the filled circles lie within the projected 90% probability
contour. This suggests that further improvements may
be possible by allowing for noise in the extraction of other
input parameters (for example, using extirpolation for the
correlation analysis). Third, the separation between the 0.5
and 0.9 probability contours is typically very small, so the
probability surface rises very steeply. Such intermediate outputs
can correspond to novelty detection. Accordingly, they occupy only
a small region of the input space and so novelty detection
occurs - as is highly desirable - for only a few lightcurves.
The small separation between the contours provides justification
for the sizes of the training and validation sets. If
there were too few patterns in these sets, the separation
would widen. Such widening happens in our network only
in a few unimportant regions, which are physically inaccessible
(that is, such a combination of input variables gives
rise to lightcurves that do not occur in nature). Fourth,
all the unfilled circles have $x_1 < -0.8$ and so lie in the
noise-dominated regime. However, the values of $x_1$ indicate
the presence of substantial positive excursions. This is already
enough to tell us that the noise in the MACHO data
is strongly non-Gaussian.

Figure 8 shows the lightcurves for 8 of the events
corresponding to the unfilled circles. For some of these events,
Figure 7. The grey-scale contours show the probability of microlensing in the input space ($x_1, \ldots, x_5$) as judged from the patterns in
the training and validation sets. The circles show the locations of the microlensing events identified by MACHO using conventional PSF
photometry. Filled circles designate the events also identified in the red filter by the network. Unfilled circles are not identified. Numbers
within circles refer to our event designations in Table 2. (Light grey means that the probability is greater than 0.5, dark grey greater
than 0.9).


it looks as though there are secondary bumps (e.g., event
3). For others, the bump is not properly contained (e.g.,
events 5 and 31) or the bump is overwhelmed by noisy data
(e.g., events 13 and 17). It seems that the performance of
our network is excellent, as these events certainly need to be
looked at with care before accepting a classification as
microlensing. However, it is premature to conclude that MACHO
have misclassified these events. This is because the
MACHO group have re-processed all the lightcurves with
difference image analysis (DIA) and this will improve the
quality of the lightcurves, reducing noise and contamination
from nearby stars. However, without having the DIA
lightcurves, we cannot confirm their verdict of microlensing.


ID   MACHO ID    deviation   R      B

1    95-BLG-30   f           1.00   1.00
2    96-BLG-12   p           1.00   0.99
3    97-BLG-1    b           0.95   1.00
4    97-BLG-8    p           1.00   1.00
5    97-BLG-26   p           1.00   1.00
6    96-BLG-3    b           0.83   0.51
7    95-BLG-18   p           0.99   0.75


Table 3. This shows the output of the committee of neural networks
on the exotic events identified towards the bulge in Alcock
et al. (2000b). These events were all selected on the basis of
the conventional PSF photometry package (SoDoPhot); f stands
for deviations due to finite source size, p for parallactic effects
and b for binarity. The output is the probability that the red (R)
and blue (B) data correspond to a microlensing event.
Figure 8. This shows the datapoints for eight of the events classified as non-microlensing by the network and as microlensing by the
MACHO collaboration. The vertical axis is flux in ADU/s and the horizontal axis is time in days. The data are presented as four strips
of 7-month sequences; the 5 months when the bulge is not visible from Australia are marked by the vertical dashed lines.


5.2 Exotic Events 

Some microlensing lightcurves can show deviations from the
standard Paczynski form caused by parallactic or finite-source
size effects or by binarity and so on (see e.g., Mao
& Paczynski 1993, Mao & Di Stefano 1995, Kerins & Evans
1999, Mao et al. 2002). In Table 3, all the exotic events
identified in Alcock et al. (2000b) using the SoDoPhot
photometry package are processed with the committee of neural
networks.

Parallactic events (like 96-BLG-12) occur when the Einstein
radius projected onto the observer’s plane is of the
order of an astronomical unit. In such a circumstance, the
changing motion of the Earth around the Sun during the
event is detectable as an asymmetry in the lightcurve with
respect to the peak. Events showing deviations caused by
finite source size (like 95-BLG-30) occur whenever the angular
size of the source is of the same order of magnitude as
the angular Einstein radius. They are usually flatter-topped
than the classical Paczynski curves for microlensing of a
point source. For both these kinds of deviation, the committee
of neural networks performs well, as shown in Table 3.
All the parallactic and finite source size events are identified
as microlensing.

However, binarity can cause much more substantial deviations.
For example, strong binary events have additional
peaks, although these can sometimes be missed if sampled
irregularly. Weak binary events may just have distortions
to the peak or the wings of the lightcurve. Accordingly, we
might expect the detection of binary lightcurves to require
the training and testing of a new neural network. This is
supported by the results in Table 3. Here, the binary event
97-BLG-1 is identified by the committee, whereas 96-BLG-3
falls into the domain of novelty detection. It is reassuring that
in the former case, the event is recognised, while in the latter case,
the event is recognised as a new phenomenon. The development
of software to recognise binary events is a problem
that has not been fully solved by any of the microlensing
collaborations to date. It seems reasonable to expect neural
networks to play a powerful role here.


6 CONCLUSIONS 

This paper has devised a working neural network that can
distinguish simple microlensing lightcurves from other forms
of variability, such as eruptive, pulsating, cataclysmic and
eclipsing variables. The network is structured to have five
input neurons and one output neuron. The inputs and output
are separated by a layer of hidden neurons. The simplicity of
the network means that it can be trained very quickly and
it can be used to process huge datasets in less than a second.
Each lightcurve is pre-processed to provide five inputs
to be fed to the network. In our application, the five inputs
were chosen on physical grounds as good discriminants for
microlensing. In other applications, different input variables
may be optimum. Our network has been constructed so that
the output is the posterior probability of microlensing.
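
To make the architecture concrete, the following is a minimal
Python sketch of a forward pass through a committee of networks
with five inputs, five hidden neurons and one output. It is not
the code used in this work. The tanh hidden activation, the
logistic output, the simple averaging over committee members,
the committee size of ten and the random weights are all
assumptions made for illustration; only the layer sizes and the
example input vector are taken from the text.

```python
import math
import random

N_INPUTS, N_HIDDEN = 5, 5  # layer sizes quoted in the paper


def make_network(rng):
    """Return hypothetical random weights for one 5-5-1 network."""
    return {
        "w_hidden": [[rng.gauss(0.0, 1.0) for _ in range(N_INPUTS)]
                     for _ in range(N_HIDDEN)],
        "b_hidden": [rng.gauss(0.0, 1.0) for _ in range(N_HIDDEN)],
        "w_out": [rng.gauss(0.0, 1.0) for _ in range(N_HIDDEN)],
        "b_out": rng.gauss(0.0, 1.0),
    }


def forward(net, x):
    """Map five inputs to one output in (0, 1), read as a probability."""
    # Hidden layer: weighted sum of the inputs passed through tanh (assumed).
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w_hidden"], net["b_hidden"])]
    # Output neuron: logistic sigmoid of the weighted hidden activations.
    a = sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]
    return 1.0 / (1.0 + math.exp(-a))


def committee_output(nets, x):
    """Average the member outputs (an assumed way of polling the committee)."""
    return sum(forward(net, x) for net in nets) / len(nets)


rng = random.Random(0)
committee = [make_network(rng) for _ in range(10)]  # hypothetical committee size
x = [-0.6, -1.5, -0.8, -0.66, 1.26]  # input coordinates quoted in Section 5
y = committee_output(committee, x)
print(f"committee output: {y:.3f}")
```

With untrained random weights the printed value carries no
physical meaning; the sketch only shows that evaluating the
network costs a few dozen multiplications per lightcurve.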

We believe that neural networks offer three important
advantages over the conventional techniques used in microlensing
experiments. First, the decision boundary separating
microlensing from non-microlensing may be rather complicated.
At present, all microlensing collaborations use a series
of cuts (for example, on the goodness of fit to a Paczynski
curve, on achromaticity and so on). This is the crudest form
of decision boundary. However, even simple neural networks
can reproduce complicated decision boundaries and so
the technique is both more efficient and more flexible. Moreover,
once a lightcurve has failed to pass a cut at the early
stages of a conventional selection process, it is lost to any
further analysis. By contrast, neural networks assign relative
importance to the input parameters, so the decision is based on
the whole of the information available.

Second, neural networks offer a superior way of calculating
the event rate, avoiding the need for any kind of efficiency
calculation. The classical procedure of identifying
events with cuts is inefficient, and this necessitates the cumbersome
Monte Carlo calculation of the numbers of synthetic
events passing the cuts. However, a properly-designed neural
network can reproduce the decision boundary well and can
enable the event rate to be computed directly for comparison
with theoretical models, thus completely sidestepping
the need for any Monte Carlo calculation of the efficiencies.

Third, novelty detection is made both more precise and
easier by neural networks. The conventional approach relies
on examination by eye of the events left over after applying
a sequence of cuts. For our neural network, we have argued
that all lightcurves with outputs between 0.2 and 0.7 may
be examples of lightcurves not contained within the training
set. These are the events which need looking at very carefully.
In the even more massive datasets of the future, it will
be important to identify possible novel events as quickly and
as efficiently as possible.
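
The thresholds quoted in this paper can be collected into a
small illustrative Python helper. The numerical values (0.5 for
the formal decision boundary, 0.9 for almost certain
microlensing, 0.2 to 0.7 for possible novelty) are taken from
the text; the names `Verdict` and `assess` and the choice to
report three independent flags are assumptions of this sketch.

```python
from dataclasses import dataclass

DECISION_BOUNDARY = 0.5     # formal decision boundary
SECURE_THRESHOLD = 0.9      # almost certain microlensing
NOVELTY_RANGE = (0.2, 0.7)  # outputs that may signal an unfamiliar lightcurve


@dataclass(frozen=True)
class Verdict:
    microlensing: bool  # output above the formal decision boundary
    secure: bool        # output above the 0.9 level
    novelty: bool       # output inside the novelty range


def assess(y):
    """Translate a network output y in [0, 1] into the three flags."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("network output must lie between 0 and 1")
    low, high = NOVELTY_RANGE
    return Verdict(
        microlensing=y > DECISION_BOUNDARY,
        secure=y > SECURE_THRESHOLD,
        novelty=low <= y <= high,
    )


# Example outputs of the kind discussed in Section 5.
for y in (0.95, 0.6, 0.05):
    print(y, assess(y))
```

Note that the novelty range overlaps the formal decision
boundary, so an output of 0.6 is flagged both as formally
microlensing and as a candidate for closer inspection.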

From the point of view of microlensing, it is interesting
to extend the work in this paper to include additional
effects. Some of the ongoing microlensing experiments are
working in the highly blended regime. For example, the
POINT-AGAPE (Paulin-Henriksson et al. 2002a,b), WeCAPP
(Riffeser et al. 2001) and MEGA (Crotts et al. 2000)
collaborations are all monitoring the nearby galaxy M31.
Here, the individual stars are not resolved, so the flux in
a pixel or superpixel is followed (Baillon et al. 1993). The
range of lightcurves in such pixel lensing experiments is very
wide - for example, microlensing events can occur in the
same superpixel as bright variable stars (e.g., the event
PA-99-N1 described in Auriere et al. 2001). So, the identification
of microlensing events becomes still more daunting. As the
complexity of the pattern recognition task increases, so we
expect the power and flexibility of the neural network approach
to pay increasing dividends. Also, in this paper, we
have concentrated on the microlensing datasets towards the
bulge, for which the source stars are often bright. It is
important to apply our techniques to the microlensing events
towards the Large Magellanic Cloud. Here, the task is harder
as the source stars are fainter and there is serious contamination
from supernovae in background galaxies. This work
will be the subject of a separate publication.

Although our application has been strongly focused on
microlensing, the technique is of general applicability in
astronomy. There are numerous ongoing or planned massive
photometry surveys using robotic telescopes (ROTSE), wide
field cameras (WASP and VISTA) and space-borne satellites
(GAIA and Eddington). Although the goals of the surveys
differ, the basic method is the same - a brute force
search through many terabytes of data for interesting but
rare events, whether planetary transits, cataclysmic variables
or optical flashes. We envisage such tasks being routinely
devolved to neural networks in the astronomy of the
future. In each case, cascades of neural networks could be
trained to filter and identify the various classes of variable
stars, to pinpoint the target events of interest and to isolate
the unexpected or new classes of phenomenon which need
looking at very carefully.


7 SPECULATIONS 

Suppose the goal is to monitor the whole sky for variability
at short time intervals down to 20th magnitude (roughly a
billion objects in our Galaxy). In this speculative final section,
we ask what is possible now and what will be possible
by 2010.

Let us consider the simple situation of a single neural
network program running on a single computer. The
middle-range hardware situation today is typically a processor
running at 2200 MHz (corresponding to approximately
1000 MIPS, or million instructions per second). In order to
predict the situation in 2010, we can use “Moore’s Law”,
which says that the number of transistors in a processor
chip doubles every year or so. Thus, by 2010 the processor
speed should be ~ 100 000 MIPS, compared to about 1000
MIPS today. But, to evaluate the progress in run time, we
must consider both the hardware and the compiler. Benchmarking
programs such as SPEC provide us with some clues as
to what will be achieved in 2010. If we look at the evolution
in performance results on SPEC tests for computers
between 1995 and 2000, we find an approximate speed-up
factor of 16, or roughly 1.74 per year. We can extrapolate
this progress over the 2002-2010 period, which gives a speed-up
factor of 85. In other words, both Moore’s Law and the
extrapolation of benchmarking suggest rather similar speed-up
factors of roughly two orders of magnitude by 2010.
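
The extrapolation above is simple compound growth, and can be
reproduced with the following illustrative Python sketch. All
the input numbers are those quoted in the text; the variable
names are hypothetical.

```python
# Benchmark extrapolation: a factor of 16 in SPEC performance over 1995-2000.
spec_factor = 16.0
years_measured = 2000 - 1995
annual_factor = spec_factor ** (1.0 / years_measured)  # about 1.74 per year

# Extrapolate at the same annual rate over 2002-2010.
years_ahead = 2010 - 2002
benchmark_speedup = annual_factor ** years_ahead  # about 84; quoted as 85 in the text

# Moore's Law figures quoted in the text: ~1000 MIPS today, ~100 000 MIPS by 2010.
moore_speedup = 100_000 / 1_000

print(f"annual factor      : {annual_factor:.2f}")
print(f"benchmark speed-up : {benchmark_speedup:.0f}")
print(f"Moore's Law factor : {moore_speedup:.0f}")
```

Both estimates come out at roughly two orders of magnitude,
which is all that the argument in this section requires.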

The time required to run the neural network itself is
negligible compared to the time required to run the pre-processing,
which extracts the parameters used by the neural
network. Our present pre-processing program requires $10^{-4}$
s to analyse 100 data points for a single star. We have chosen
100 datapoints as it might correspond to sampling 3 times
a night for one month, which is reasonable for the detection
of fast transient events. Alternatively, it might correspond
to sampling once a night for 3 months, which is reasonable
for the detection of variability like microlensing with a
characteristic timescale of ~ 1 month. At present, it therefore
takes ~ $10^{5}$ s (or over a day) to analyse such a dataset for the
whole sky. By 2010, it will take only 20 minutes for such a
program to run on the whole sky (using the speed-up factor
of 85). More generally, the time taken in seconds to analyse
a set of $N_{\rm pts}$ data points for $N_*$ stars in 2010 is

$$ t \approx 2 \times 10^{-9}\, N_* N_{\rm pts} \log N_{\rm pts}. \qquad (8) $$

Let us assume there are 8 hours of observing time a night
and that we wish to process a month’s data for the whole
sky in real-time. Requiring the processing time of equation (8)
to equal the sampling interval t (in seconds), we can derive
the real-time equation

$$ t^2 \approx 1.7 \times 10^{6}\, (13.7 - \log t), \qquad (9) $$

which has a solution t ~ 50 minutes. In other words, real-time
processing of variable phenomena across the entire sky
down to 20th magnitude will be possible for sampling intervals
of > 1 hr by 2010.
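
The real-time condition can be checked numerically with the
Python sketch below, which is an illustration rather than the
calculation actually performed for this paper. It assumes a
month of 30 nights, so that sampling every t seconds yields
$N_{\rm pts} = 8.64 \times 10^{5} / t$ points, and it takes the
logarithm in equation (8) to be natural; both assumptions are
inferred from the constant 13.7, which matches
$\ln(8.64 \times 10^{5})$. The function names and the bisection
bracket are hypothetical.

```python
import math

N_STARS = 1e9                       # roughly a billion objects down to 20th magnitude
COEFF = 2e-9                        # seconds, prefactor of equation (8)
OBSERVING_SECONDS = 30 * 8 * 3600   # assumed: 30 nights of 8 hours each


def processing_time(n_stars, n_pts):
    """Equation (8): time in seconds to analyse n_pts points for n_stars stars."""
    return COEFF * n_stars * n_pts * math.log(n_pts)


def excess(t):
    """Processing time minus the sampling interval t (both in seconds)."""
    n_pts = OBSERVING_SECONDS / t
    return processing_time(N_STARS, n_pts) - t


def solve_real_time(lo=60.0, hi=86400.0, tol=1e-3):
    """Bisection: excess is positive for short intervals, negative for long ones."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_real = solve_real_time()
print(f"real-time sampling interval: {t_real / 60.0:.0f} minutes")
print(f"100 points, whole sky: {processing_time(N_STARS, 100) / 60.0:.0f} minutes")
```

Under these assumptions the root lies a little above 50
minutes, in line with the solution quoted for equation (9).
With a natural logarithm, equation (8) gives about a quarter of
an hour for 100 points per star, the same order as the 20
minutes obtained from the speed-up factor alone.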

Our speculative calculation errs on the pessimistic side
because we have not taken into account any correction for
the application of parallel processing or the fast developing
GRID technology for high performance computing. However,
it surely does enough to convince the reader that, properly
trained, neural networks can analyse huge datasets very
quickly. They will become one of the methods of choice for
data-mining in the massive variability surveys of the very
near future.